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Quantile regression is often used when a comprehensive relationship between a response variable 
and one or more explanatory variables is desired. The traditional frequentists’ approach to 
quantile regression has been well developed around asymptotic theories and efficient algorithms. 
However, not much work has been published under the Bayesian framework. One challenging 
problem for Bayesian quantile regression is that the full likelihood has no parametric forms. In 
this paper, we propose a Bayesian quantile regression method, the linearly interpolated density 
(LID) method, which uses a linear interpolation of the quantiles to approximate the likelihood. 
Unlike most of the existing methods that aim at tackling one quantile at a time, our proposed 
method estimates the joint posterior distribution of multiple quantiles, leading to higher global 
efficiency for all quantiles of interest. Markov chain Monte Carlo algorithms are developed to 
carry out the proposed method. We provide convergence results that justify both the algorithmic 
convergence and statistical approximations to an integrated-likelihood-based posterior. From 
the simulation results, we verily that LID has a clear advantage over other existing methods in 
estimating quantities that relate to two or more quantiles. 

Keywords: Bayesian inference; linear interpolation; Markov chain Monte Carlo; quantile 
regression 


1. Introduction 

Quantile regression, as a supplement to the mean regression, is often used when a com¬ 
prehensive relationship between the response variable y and the explanatory variables x 
is desired. Consider the following linear model: 

yi = xj 13-\-Ei, i = l,2,...,n, (1.1) 

where yi is the response variable, Xi is a p x 1 vector consisting of p explanatory variables, 
/3 is a p X 1 vector of coefficients for the explanatory variables, and £i is the error term. 
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The quantile regression analysis models the rth conditional quantile of y given x as: 

Qyi{T\xi) = xJ I3{t), i = l,2,...,n, (1.2) 

which is equivalent to (1.1) with QeiiT\xi) = 0. The r-specific coefficient vector /3(r) can 
be estimated by minimizing the loss function: 

n 

mm^Pr(yi-a:f/3(r)), (1.3) 

^ i—1 

where Priu) = ur if m > 0, and Priu) = u{t — 1) if u < 0; see Koenker [6]. 

To make inference on the quantile regression, one could use the asymptotic normal 
distribution of the estimates or use the bootstrap method. Aside from the regular boot¬ 
strap such as the residual bootstrap and the {x, y) bootstrap, one could also use Parzen, 
Wei and Ying [10] ’s method or the Markov chain marginal bootstrap method (He and 
Hu [5]). 

In contrast to the rich literature on quantile regression with the frequentist view, not 
much work has been done under the Bayesian framework. The most challenging problem 
for Bayesian quantile regression is that the likelihood is usually not available unless the 
conditional distribution for the error is assumed. 

Yu and Moyeed [17] proposed an idea of employing a likelihood function based on the 
asymmetric Laplace distribution. In their work, Yu and Moyeed assumed that the error 
term follows an independent asymmetric Laplace distribution 

/r('u)=r(l-T)e“'’"(“^ uGR, (1.4) 

where pr (u) is the loss function of quantile regression. The asymmetric Laplace distribu¬ 
tion is very closely related to quantile regression since the mode of fr{u) is the solution 
to (1.3). Reich, Bondell and Wang [11] developed a Bayesian approach for quantile re¬ 
gression assuming that the error term follows an infinite mixture of Gaussian densities 
and their prior for the residual density is stochastically centered on the asymmetric 
Laplace distribution. Kottas and Gelfand [7] implemented a Bayesian median regression 
by introducing two families of distributions with median zero and the Dirichlet process 
prior. Dunson and Taylor [4] used a substitution likelihood proposed by Lavine [9] to 
make inferences based on the posterior distribution. One property of Dunson and Tay¬ 
lor’s method is that it allows regression on multiple quantiles simultaneously. Tokdar and 
Kadane [15] proposed a semiparametric Bayesian approach for simultaneous analysis of 
quantile regression models based on the observation that when there is only a univariate 
covariate, the monotonicity constraint can be satisfied by interpolating two monotone 
curves, and the Bayesian inference can be carried out by specifying a prior on the two 
monotone curves. Taddy and Kottas [14] developed a fully nonparametric model-based 
quantile regression based on Dirichlet process mixing. Kottas and Krnjajic [8] extended 
this idea to the case where the error distribution changes nonparametrically with the 
covariates. Recently, Yang and He [16] proposed a Bayesian empirical likelihood method 
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which targets on estimating multiple quantiles simultaneously, and justified the validity 
of the posterior based inference. 

In this paper, we propose a Bayesian method, which aims at estimating the joint 
posterior distribution of multiple quantiles and achieving “global” efficiency for quan¬ 
tiles of interest. We consider a Bayesian approach to estimating multiple quantiles as 
follows. Let Ti,. ..,Tm be m quantiles in model (1.2) and Bm = (/3(ti), ... ,/3{Tm))- Let 
X = (a:i,..., Xn)' and Y = (yi,..., pn) be the observations of size n. For each pair of ob¬ 
servation (xi,yi), the likelihood L{Bm\xi,yi) = p{yi\xi, Bm) is not available. However if 
we include fi, the probability density function (pdf) of the conditional distribution y\xi, 
as the nuisance parameter, then the likelihood L{Bm, fi\xi,yi) = p{yi\xi, Bm, fi) = fiiVi)- 
This is to treat Bayesian quantile regression as a semi-parametric problem: the parameter 
of interest is finite dimensional and the nuisance parameter is nonparametric. To elimi¬ 
nate the nuisance parameter, we use the integrated likelihood methods recommended by 
Berger, Liseo and Wolpert [1]. More specifically, let 0/^ be all the quantiles of fi, and 
dm,i = XiBm be the m quantiles of interest. We can define p{yi\xi, Bm) as 

p{y^\xi,Bm)= [ p{yi\9f.)dlle„,{fi), (1.5) 

where denotes the subset of well-behaved pdfs (will be defined precisely in Sec¬ 

tion 3.2) with those m quantiles equal to 9m,i, nemi(‘) denotes the prior on fi\0m,i £ 
(will be specified in Section 3.2), and p{yi\6fi) = fi{yi) because fi{jj\xi) is de¬ 
termined by the conditional quantile functions. Here, p{yi\xi,Bm) can be viewed as an 
integral of a function or an expectation with the densities as the random variable. The 
posterior distribution of Bm\X,Y can be written as 

p{Bm\X, Y) OC TTm{Bm\X)L{Y\X, Bm), (1-6) 

where ■Km{Bm\X) is the prior on Bm and L{Y\X,Bm) = ]Xi=iP{yi\^i^Bm)- 

One practical difficulty with the above approach is that the integration step to remove 
the nuisance parameter is computationally infeasible except for the case of m = 1 (Doss 
[3]). To circumvent this issue, we consider a different approximation to the likelihood. 
Note that XiBm gives the m quantiles of the conditional distribution y\xi based on model 
(1.2). These m quantiles can be used to construct an approximate conditional distribu¬ 
tion y\xi through linear interpolation. With this approximate likelihood, an approximate 
posterior distribution becomes available. We show that the total variation distance be¬ 
tween the approximate posterior distribution and p[Bm\X,Y) (the posterior based on 
the integrated likelihood) goes to 0 as n,..., becomes dense in (0,1) as to —>■ oo. A 
Markov chain Monte Carlo (MCMC) algorithm can then be developed to sample from the 
approximate posterior distribution. The recent work of Reich, Fuentes and Dunson [12] 
used large-sample approximations to the likelihood to do Bayesian quantile regression. 
Their approach also aims to achieve global efficiency over multiple quantiles, and can 
adapt to account for spatial correlation. In contrast, our work uses approximations at a 
hxed sample size n and provides a Bayesian interpretation of the posterior quantities. 
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The rest of the paper is organized as follows. Section 2 introduces the proposed method. 
Section 3 provides the convergence property of the algorithm as well as the approximate 
posterior distribution. Section 4 compares the proposed method with some existing meth¬ 
ods through simulation studies and applies the proposed method to real data. Section 5 
provides concluding remarks. 


2. Methodology 

In this section, we describe the linearly interpolated density to be used in approximating 
the likelihood, and then give the layout of our MCMC algorithm for posterior inference. 
We list again the basic setting introduced in Section 1. Let X = (xi,... ,a;„)' and Y = 
(yi,..., Un) be the observations. Let ri,..., be m quantiles in model (1.2) and Bm, = 
(/3(ti), ... ,fi{Tm))- We are interested in the posterior distribution Bm\X,Y. 


2.1. Linearly interpolated density 


The likelihood is generally not assumed under the quantile regression model, but XiBm 
gives the m quantiles of the conditional distribution y\xi. With the linearly interpolated 
density based on the m quantiles, we can approximate the true likelihood from a sequence 
of specified quantile functions. 

Here is how the linear interpolation idea works in a simple setting. Suppose Z ^ F[z), 
where F{z) is the cumulative distribution function (cdf) of Z. Let f{z) be the pdf of 
Z. Let Tz = F{z), and ti,T 2 be two constants such that 0 < ti < < T 2 < 1. Then 

< z < F~^{t 2 ) if f{z) is continuous and non-zero on the support of Z. We can 
approximate f{z) by 


F-^iT2)-F-^{T,y 

because 

_ ^2 ~ Tl _ _ _ T2 — Ti _ _ f(y*\ (0 9i 

where n < r* < T 2 and z* = F-^{t 2 )). 

Now we extend the interpolation idea to model (1.2). Given Bm = I3(t2), ■ ■ ■, 

id(Tm)), we could calculate the linearly interpolated density fi{yi\xi,Bm), i = l,2,...,n, 

by 


f^{yi\x^,Bm) 


~m— 1 

-i=i 


_Li-n L?_ 

Xil3{Tj+i) - XiP{Tj) 


—oo,Xi^(Ti))}H /l (j/i) + ^{yiG{xiP{Tm),oo)}{^ 


(2.3) 

Tm)h(.yi), 
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where /i is distributed as the left half of N{xiP{Ti),a^), /2 is distributed as the right 
half of N(xi/3{Tm),cr'^), and is some pre-specified parameter. 

Let pm,{Y\X, Bjn) = nr=idenote the approximate likelihood. One possi¬ 
ble prior TTmiBmlX) on B^ is a truncated normal 7V(^,S) satisfying 

XiP{Ti) <XiP{T 2 ) <■■■ <x^l3{Tm), i = l,2,...,n. (2.4) 


Since we include the intercept in model (1.2), the first element of Xi is 1, and at least 
the parallel quantile regression lines satisfy (2.4). The corresponding posterior is 


Pm{Bm\X,Y) 


Trm{Bjn\X)pm{Y\X,Bm) 

PmiY\X) 


(2.5) 


where Pm{Y\X) = f 'Km{Bm\X)pm(Y\X, Bm) dBm - In the next section, we give a MCMC 
algorithm to sample Bm from this posterior. We show later that the total variation 
distance between this posterior distribution and the target posterior p(Bm|^, L") goes to 
0 as m goes to infinity. 


2.2. Algorithm of the linearly interpolated density (LID) method 

We incorporate the linearly interpolated density into the following modified Metropolis- 
Hastings algorithm to draw samples from pm{,Bm\X,Y). 

1. Choose an initial value B'^ for Bm. One good choice is to use the parallel quantile 
estimates, that is, all the slopes for the quantiles are the same and the intercepts 
are different. We could use the quantreg (a function in R) estimates of the slopes for 
the median as the initial slopes, and use the quantreg estimates of the intercepts for 
each quantile as the initial intercepts. In case a lower quantile has a larger intercept 
than an upper quantile, we could order the intercepts such that the intercepts 
increase with respect to r. If there are ties, we could add an increasing sequence 
with respect to r to the intercepts to distinguish them. Another possible choice for 
the initial value is to use Bondell, Reich and Wang [2]’s estimate which guarantees 
the non-crossing of the quantiles. 

2. Approximate the densities. With the initial values of the parameters, we can calcu¬ 
late the linearly interpolated density fi{yi\xi, B^), * = 1,2,..., n, by plugging B^ 
into equation (2.3). Let = nr=i 

3. Propose a move. Suppose we are at the kth iteration. Randomly pick a number Tj 

from Ti,T 2 ,...,Tm and then randomly pick a component /Ij^“^(rj) of /3 ^“^(tj) to 
update. To make sure that the proposed point /3 ;*(tj) satisfies constraint (2.4), we 
can calculate a lower bound Ijy and an upper bound Uj^i for l3*(Tj) and generate 
a value for l3*{Tj) from Uniform(/j^,In case Ijy = —oo or Ujy = oo, we will 
use a truncated normal as the proposal distribution. The details on how to find the 
bounds are in Appendix A.l. Denote /3 *(tj) as the updated by replacing 

its Ith. component pf~^{Tj) by the proposed value /3*(Tj). 
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4. SetB^ = {P’^ 1(t,_i),/3*(tj),/3'= 1(t,+i), ... We can cal¬ 

culate the linearly interpolated density f*{yi\xi,B^), i = 1,2,by plugging 
B^ into equation (2.3). Let L* f*{yi\xi, B^). 

5. Calculate the acceptance probability 


r = min 



7rm{B*jX)L*q{B*^^Bi-^) \ 

TTm{Bt^\X)L'^-\{Bic^^B^)) 


( 2 . 6 ) 


where q{B^^ —)► B^) denotes the transition probability from B'^^ to B^. Notice 
that these two transition probabilities cancel out if we choose symmetric proposals. 
Let BI^ = B^ with probability r, and B^ = BI^^ with probability 1 — r. If B^ = 
B^, then = L*-, otherwise = L^~^. 

6. Repeat steps 3-5 until the desired number of iterations is reached. 


3. Theoretical properties 

In this section, we give the stationary distribution of the Markov chain in Section 2.2 for 
fixed m, and study the limiting behavior of the stationary distribution as m —^ oo. 

3.1. Stationary distribution 

Since we replace the true probability density function by the linearly interpolated den¬ 
sity in the Metropolis-Hastings algorithm in Section 2.2, it is not obvious what the 
stationary distribution of the Markov chain is. The following theorem, whose proof is in 
Appendix A.2, says that the Markov chain converges to prn{Bm\X,Y) defined in (2.5). 

Theorem 3.1. The stationary distribution of the Markov chain constructed in Sec¬ 
tion 2.2 is Prn{Bm\X,Y). 

This theorem implies that we can use the algorithm in Section 2.2 to draw samples 
from pm(B„|X,y). 

3.2. Limiting distribution 

In this section, we show that as m —>■ oo, the total variation distance between the station¬ 
ary distribution pm{Bm\X,Y) and the target distribution p{Bm\X,Y) (defined in (1.6)) 
goes to 0. The proof requires the following assumption about fi, the probability density 
function of the conditional distribution y\xi. All the results are stated for a given sample 
size n. 

Assumption 3.1. Let qf^r he the rth quantile of f, and Mi, M 2 and c be constants. 
The densities of y\xi are in the set ^ = {/| f f dx = 1,0 < f < Mi, |/'| < M 2 , and f{x) < 
c/ y/m for X < qpijm and for x > qf,(m-i)/m,'m = 2,3,...}. 
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The assumption implies that ^ is a set of bounded probability density functions with 
bounded first derivatives and controlled tails. The restrictions on the tails are not hard to 
satisfy. The Cauchy distribution, for example, is in the set. For the Cauchy distribution, 
the Mh quantile is q^/m = tan(7t(i - i)) = -ctan(^), so /(gi/^) = = 

i sin^(^) = O(^) < for some c. The set ^ appeared in (1.5) denotes the subset 
of ^ that contains all the pdfs with those m quantiles equal to 9m,i = XiBm- 

We now specify the prior on fi{-\xi) £ denoted by n(/i), and the prior on fi\6m,i £ 
denoted by .(/i). We know from (1.2) that the rth quantile of fi{-\xi), the 
conditional distribution of y given x = Xi, is xfj3(T). Let us consider /3(r) as a function 
of T, where 0 < r < 1. Because xj /3(t), 0 < r < 1, determines all the quantiles of fi{-\xi) 
based on (1.2), and therefore determines fi{-\xi) (Koenker [6]), the prior on fi{-\xi) 
can be induced from the prior on /3(t). To satisfy Assumption 3.1, we use a Gaussian 
process prior on /3"(r) so that /3(t) has the second derivative, and then /^’s have the 
first derivative. The prior n(/i) on fi{-\xi) is induced from the prior on ,5(t). The prior 
n6»m,i(/i) on fi\6m,i is iuduced by The prior on Bm can be obtained from the prior 

on /3(r), because Bm is a vector of m points on /3(r). With the specihcation of these 
priors, p{yi\xi, Bm) and p{Bm\X,Y) given in (1.5) and (1.6) are well-defined. 

To study the limiting distribution as m —?► oo, we assume the sequence of quantile levels 
satishes the following condition: 

= max (tj+1 - Tj) = 0( — ), (3.1) 

0<j<m \m / 

where tq = 0 and Tm+i = 1- This condition is not difficult to satisfy. For example, we 
can start from mo = Mg quantile levels: r = Mq+i ’' •' ’ a^+i ’ '"^hich include the 

quantiles of interest. We add new r’s one by one so that the new r divides one of the pre¬ 
vious intervals in halves, that is, t = 2(Mo+i) ' 2(Mo+i) ^ ■ ■ ■ > 2 (ZlX\) ^ a{m\+i) > 4(Mo+i) ^ ^ 

4 '(Mo+i) sequence of quantiles, we have At = maxo<j<m,(Tj+i — Tj) < 

- = o(^). 

To prove the convergence of distributions, we use the total variation norm, \\pi — 
M 2 ||tv = sup^|pi(A) — /i 2 (A)| for two probability measures pi and where A de¬ 
notes any measurable set. It is more convenient to use the following equivalent defi¬ 
nition (Robert and Casella [13], page 253): jj^i — P 2 \\ty = 5 Sup|^|<i [/ h{x)yii{dLx) — 
f h(x)p, 2 {dx)\. The following theorem gives the limiting distribution of the stationary 
distribution as m —>■ oo. 


Theorem 3.2. 


0 (^)- 


\\Pm{Bm\X,Y) — p{Bm\X,Y)\\Tv ^ 0 flsm—>-oo, assuming Tj+i — Tj = 


The proof is in Appendix A.3. As a consequence of Theorem 3.2, we have the following 
corollary. 

Corollary 3.1. Let iq be the quantiles of interest, which is contained in Bm- We have 
\\pm{ri\X,Y)-p{r]\X,Y)\\TY^Q asm^oo, assuming Tj+i - Tj = 
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The above corollary says that by the linearly interpolated density approximation the 
posterior distribution of the quantiles of interest converges to the target distribution. The 
theorem requires that we need to increase m in the algorithm. Although m is fixed in 
applications, the convergence result lends support to Pm{Bm\X,Y) as an approximation. 


4. Comparison of LID with other methods 


In this section, we compare the proposed method with some existing methods through 
three simulation studies. In the quantile regression model (1.2), if the conditional densi¬ 
ties fi{y\xi) are different for different observation i, one could apply weighted quantile 
regression to improve the efficiency of estimates (Koenker [6], page 160). In this case, the 
loss function would be: 

n 

xf (4.1) 

.-—I 


where Wi denotes the weight for the fth observation. The optimal weight is the conditional 
density fi{y\xi) at the rth quantile. Because the density is not available generally, one 
could approximate the density by a nonparametric density estimate. One simple way is 
to use 


2Ar 

xf + At) — — Ar)) ’ 


f = l,2,...,n, 


(4.2) 


where denotes the unweighted quantile regression estimate. When the weight is nega¬ 
tive due to crossing of quantile estimates, we just set the weight to be 0. This occurs with 
probability tending to 0 as n increases. To make inference, one could use the asymptotic 
normal distribution of the estimates or use the bootstrap method. 


4.1. Example 1 

The data were generated from the following model 


yi = a + bxi +{1 + Xi)ei, i = l,2,...,n, (4.3) 

where e^’s are independent and identically distributed (i.i.d.) as A(0,1). We chose 
n= 100, a = 5 and b = 1. The covariate Xi was generated from lognormal(0,1). The 
corresponding quantiles of interest are 

1 TTi 

Qyi{,T\xi)=a{T)+b{T)xi, i = l,2,...,n,T=-- —r- (4.4) 

Here we report the results on the 0.25, 0.5 and 0.75 quantiles and the difference between 
the 0.75 and 0.5 quantiles by comparing the mean squared error (MSE) for the slope 
estimates from hve different methods: the proposed linearly interpolated density method 
(LID), the regular regression of quantiles (RQ), the weighted RQ with estimated weights 
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Methods 

5(0.25) 

6(0.5) 

6(0.75) 

6(0.75) -6(0.5) 

RQ 

23 (4) 

19 (2) 

19 (3) 

15 (2) 

EWRQ 

16 (2) 

13 (2) 

15 (3) 

11 (2) 

LID 

22 (4) 

15 (2) 

13 (1) 

3 (0.6) 

Yu and Moyeed 

21 (4) 

17 (2) 

16 (3) 

10(1) 

Reich et al. 

16 (2) 

15 (2) 

23 (3) 

11 (1) 


(EWRQ) (Koenker [6]), the pseudo-Bayesian method of Yu and Moyeed [17], and the 
approximate Bayesian method of Reich, Fuentes and Dunson [12]. We generated 100 data 
sets for computing the MSE. 

For LID and Yu and Moyeed’s method, we used the normal prior 1V(0,100) for each 
parameter air) and &(t). For LID, we chose m = 49, equally spaced quantiles between 0 
and 1 (which include the quantiles of interest: 0.25, 0.5 and 0.75), and the length of the 
Markov chain is 1 000 000 (half of the samples were used as burn-in). We ran such a long 
chain because we updated 98 parameters one at a time, which means we updated each 
parameter about 10 000 times on average. Every thousandth sample in the chain is taken 
for the posterior inference. For Yu and Moyeed’s method, a Markov chain with length 
5 000 (half of the samples were used as burn-in) seems enough for the inference, partially 
because Yu and Moyeed’s method is dealing with one quantile at a time and has only 
two parameters. For Reich et al.’s method, we simply used their code and set the length 
of the chain to be 2 000 (half of the samples were used as burn-in). Notice that for LID 
and Reich et al.’s method, only one run is needed to provide all results in the table, and 
other methods have to run for each r. 

From the results in Table 1, we can see that LID did better than RQ and Yu and Moy¬ 
eed’s method. Comparing with weighted RQ and Reich et al.’s method, LID gave better 
estimates for upper quantiles but poorer estimates for lower quantiles. For estimating 
the differences of quantiles, LID is clearly the best among all the methods. 


4.2. Example 2 

The data were generated from the following model 


yi = a^-hxi^i+cx 2 ,i + {l + xi^i + X 2 ,i)ei, i = l,2,...,n, (4.5) 

where e^’s are i.i.d. from JV(0, 1). In the simulations, we chose n = 100, a = 5, 5=1, and 
c = 1. The covariates xi^i was generated from lognormal(0,1) and X 2 ,i was generated 
from Bernoulli(0.5). The corresponding quantiles of interest are 


i 


l,2,...,n,T = 


1 

I 7; ■ ■ ■ : 

m -I- 1 


m 

m -I- 1 


= o-(t) + 5(r)a:i,i -I- c{T)x2,i, 


(4.6) 
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Table 2. n X MSE and its standard error (in parenthesis) for Example 2 


Methods 

&(0.5) 

6(0.75) 

6(0.75)-6(0.5) 

c(0.5) 

c(0.75) 

c(0.75) -c(0.5) 

RQ 

22 (3) 

25 (3) 

20 (3) 

47 (9) 

52 (7) 

42 (6) 

EWRQ 

15 (2) 

19 (3) 

16 (2) 

46 (8) 

49 (8) 

40 (6) 

LID 

17 (2) 

18 (2) 

2.9 (0.4) 

36 (5) 

42 (6) 

18 (2) 

Yu and Moyeed 

20 (2) 

21 (3) 

13 (2) 

42 (7) 

45 (6) 

28 (4) 

Reich et al. 

20 (3) 

29 (5) 

11 (1) 

4.2 (0.6) 

8.6 (1.1) 

3.1 (0.3) 


We compared the five methods with the same performance criterion as Example 1. We 
generated 400 data sets for computing the MSE. The results are in Table 2. We see that 
for the quantile estimates, LID (with m = 15) and EWRQ perform similarly, and LID 
outperforms RQ and Yu and Moyeed’s method. For estimating the difference between 
quantiles, LID outperforms RQ, EWRQ, and Yu and Moyeed’s method. Comparing with 
Reich et al.’s method, LID gave better estimates for parameter b but poorer estimates 
for parameter c. 

From the two simulation studies, we can see that most of the time the proposed LID 
method works as well as the weighted RQ, and outperforms RQ and Yu and Moyeed’s 
method, for estimating quantiles. LID performs better than Reich et al.’s method in 
some cases and is outperformed by Reich et al.’s method in others. LID has a significant 
advantage over other methods in estimating the difference of quantiles. When several 
quantiles are of interest, including their differences, there is a clear efficiency gain in 
using LID. 

4.3. Empirical studies 

In this section, we look at the June 1997 Detailed Natality Data published by the National 
Center for Health Statistics. Following the analysis in Koenker ([ 6 ], page 20), we use 
65 536 cases of recorded singleton births. We consider the following quantile model for 
the birth weight data: 

= «('^) + + c{T)xi^2 + d{T)xi^3 + e{T)xi^4, * = 1,2,... ,n, (4.7) 

where Xi^i is the indicator function that indicates whether the mother went to prenatal 
care for at least two times, Xi ^2 is the indicator function that indicates whether the 
mother smoked or not, Xi^s is mother’s weight gain in pounds during pregnancy, and Xi ^4 
is the square of mother’s weight gain. The mother’s weight gain enters the model as a 
quadratic following the discussion in Koenker ([ 6 ], page 23). To make the results more 
comparable, we consider a slight modification of model (4.7): 

QyiiT\xi) =a('r) +b{T)xi^i + c(T)xi,2 +d*(r)x*3 +e*(T)x*4, i = 1,2, ...,n, (4.8) 

where x* ^ denotes the standardized mother’s weight gain during pregnancy and 1*4 
denotes the standardized square of mother’s weight gain. We compared the results from 
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Table 3. Estimates of the parameters and their standard errors (in parentheses) for the birth 
weight data 


Methods 6(0.1) 

c(O.l) 

d*(0.1) 

e*(0.1) 

b(0.25) 

c(0.25) 

d*(0.25) 

£*(0.25)5(0.5) 

c(0.5) 

d*(0.5) 

£*(0.5) 

RQ 

-0.030 

-0.22 

0.37 

-0.21 

-0.049 

-0.22 

0.19 

-0.075 -0.061 

-0.22 

0.127 - 

-0.020 


(0.009) 

(0.01) 

(0.02) 

(0.02) 

(0.008) 

(0.008) 

(0.011) 

(0.012) (0.006) 

(0.007) 

) (0.008) 

(0.008) 

LID 

-0.045 

-0.22 

0.36 

-0.22 

-0.052 

-0.23 

0.20 

-0.081 -0.061 

-0.23 

0.131 - 

-0.026 


(0.007) 

(0.003) 

1 (0.002) 

(0.003) 

1 (0.001) 

(0.002) 

(0.008) 

(0.007) (0.003) 

(0.003) 

) (0.002) 

(0.002) 


RQ and LID (with m = 39) for the full data set. Here we focus on the 0.1, 0.25, and 
0.5 quantiles. The results are in Table 3. From the results, we can see that the estimates 
from both methods are very close. The standard error from LID seems to be smaller than 
that from RQ. 

To see how good the estimates are, we compared the estimated conditional quantile 
with the local quantile estimated nonparametrically. We considered two subsets of the full 
data. For the first subset of the data, we selected = 1, Xi ^2 = 1, and 24.5 < < 25.5, 

within which range there are 96 observations. For the second subset of the data, we 
selected xyi = 1, Xy 2 = 0, and 44.5 < Xi^s < 45.5, within which range there are 1318 
observations. Then we calculated the quantile of pi in each subset of the data as the local 
quantile, and compared it with the predicted quantiles from RQ and LID. The results 
are presented in Table 4. From the results, we can see that all the estimated quantiles 
are very close to the local quantile estimates. 

Another way to check the model fitness is to build the model by leaving out a portion 
of the data, and then evaluate the model performance on the out-of-bag portion of the 
data. Here we compared the out-of-bag quantile coverage (the percentage of the testing 
data that fall below the rth quantile line) by randomly selecting 10% of the data as the 
out-of-bag testing data and using the rest as the training data. The results based on a 
random splitting are summarized in Table 5. We can see that both RQ and LID have 
coverages similar to the nominal values. 

From this example we can see that the model parameter estimates, including the 
quantiles, from both RQ and LID are very similar, but LID estimates are associated 
with lower standard errors, which corroborates our findings in simulation studies. 


Table 4. Estimates of the local quantile 


Quantile 

Xi.l = 1, Xi,2 

= 1, and Xi ^3 

= 25 

Xi,l = 1, Xi,2 

= 0, and Xi ,3 

= 45 

Local quantile 

RQ 

LID 

Local quantile 

RQ 

LID 

0.1 

2.54 

2.44 

2.43 

2.89 

2.90 

2.88 

0.25 

2.81 

2.76 

2.75 

3.18 

3.17 

3.17 

0.5 

3.02 

3.07 

3.07 

3.54 

3.47 

3.46 
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Table 5. Out-of-bag quantile coverage 


Methods 

T = 0.1 

t = 0.25 

t = 0.5 

t = 0.75 

r = 0.9 

RQ 

0.100 

0.251 

0.504 

0.749 

0.895 

LID 

0.093 

0.249 

0.506 

0.748 

0.909 


5. Conclusion 

In this paper we proposed a Bayesian method for quantile regression which estimates mul¬ 
tiple quantiles simultaneously. We proved the convergence of the proposed algorithm, i.e., 
the stationary distribution of the Markov chain constructed by LID would converge to 
the target distribution as the number of quantiles m goes to inhnity. In the simulation 
studies, we found that choosing m = 15 already gave satisfactory results. In the compar¬ 
ison of the proposed LID method with other methods, LID provides comparable results 
for quantile estimation, and gives much better estimates of the difference of the quantiles 
than other methods (RQ, weighted RQ, and Yu and Moyeed’s method). 

The LID method is computationally intensive, and it requires longer time than other 
methods to obtain the results. Therefore, it is of interest to optimize LID to reduce the 
computational cost. 

The LID method uses m quantiles to construct an approximation to the likelihood 
through linear interpolation. For large m, it would be useful to impose regularization 
to make inference more efficient. We may assume that /3 (t) can be characterized by a 
few parameters, so we have a low-dimensional parameter space no matter what m is, 
and the computation of LID would simplify. On the other hand, this approach involves 
additional assumption or approximation which would require additional work for its 
theoretical justification. 


Appendix: Technical details 


A.l. Find the bounds for the proposal distribution 


This is for step 3 of the algorithm in Section 2.2. For each observation {yi,Xi), i = 
1,2, ...,n, we can calculate a lower bound and an upper bound Then = 

inaxi{ljg^i) is taken as the maximum of all these lower bounds and Uj^i = mini(uj^_i) is 
taken as the minimum of all these upper bounds. The formula to calculate and 
is given as follows. 

If 1 < j < m and Xi^i > 0, where Xi^i denotes the 1th element of Xi, then 




Ui,Lz = 


Xi,l 

Xil 


and 
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If 1 < j < m and Xij < 0, then 




and 


Xil 


If j = 1 and Xi^i > 0, then 


</3 ^{Tj-i)-J2tMXi,tPt (Xj) 


Xi,l 


= -oo and Ujj^i = 


Xil 


If j = I and Xi^i < 0, then 


hdd ~ 


xfl3>^-Hrj+i)-Et^iX.,tPrHr,) 


Xi,l 


and U4ii = Qo. 


li j = m and xty > 0, then 




hd.i ~ 


li j = m and Xiy < 0, then 


•-i y \ij 


t^Ll ‘ 


and 


Xiy 


— 00. 


lj,i,i = -oo and Ujy^t = 
If Xiy = 0, then 


Xil 


Ipi ^i — OO and — oo ■ 


A.2. Proof of Theorem 3.1 


We will verify the detailed balance condition to show that the stationary distribution is 
Pm{Bm\X,Y). Denote the probability of moving from Bm to B'^ by K{Bm B'^) and 
the proposal distribution by q{Bm B'^). We have 


Pm{Bm\X,Y)KiBm^B'J 


= PmiBm\X,Y)q{Bm “)■ B'^) uim 



7rrr,{B'jX)p,^{Y\X,B'Jq{B'^^ Bm) \ 
TTmiBm\X)pm{Y\X, B^)q{Bm B'^) J 


Trm{Bm\X)pm{Y\X,Bm) 

Pm(Y\X) 


q{B 

m -^B'^)mm[ 1 , 


7r^(i?;,|X)p^(r|X, B'JqjB'^ ^ Bra) \ 
7rm{Bm\X)p^{Y\X, Bm)qiBm B'^)j 
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_ '^miB^\X)pm{Y\X, B^) / ryf , ry \ ■ ( '^m{_Bm\X')PmiY\X, -^m) i 

MY\X) ™ ^>'^'\7:,^{B'JX)pr^{Y\X,BMB'^^By^y 

= Pm{B'JX,Y)K{B',^^ Bm). 

So the detailed balance condition is satisfied. 

A.3. Proof of Theorem 3.2 

To prove Theorem 3.2, we need three lemmas. 

Lemma A. 1. Let Pm{yi\dm,i) = fi{.yi\xi,Bm) given in (2.3). Assume Tj+i-Tj = 0{^). 
Then 

(a) \Pm{yi\Sm.,i)—p{yi\(^fi)\ = 0(-^) Uniformly in the support ofy as well as uniformly 
in i. 

(b) \pm{Y\X,Bm) - p{Y\X,Bm)\ =0(-^) uniformly in the support ofY. 


Proof, (a) We will prove this proposition in two different cases. 

Case 1: If yi is between two quantiles we are using, in which case we can find two 
consecutive quantiles qt^rj and qi,Tj+i such that yi € lqi,Tj ,qi,Tj+i), where 1 < j < m — 1, 
then by the mechanism of linear interpolation, we have the following equation 


Pm(yil0m,i) = 

Qi,rj 

_ _ '^i+1 ~ _ 

~ - F-\t,) 

_ _ '’'j+i ~ T)' _ 

(T;-1)'(t*)(Tj + i -Tj) 

__ 'Tj+i ~ T) _ 

i^/hiyl))iT-j+i - Tj) 

= My*), 

where r* e [Tj,Tj+i),y* € T* denotes the cdf of yi\0f, Fi{y*) = t*, and fi 

denotes the pdf of yi\0f. 

Now we want to show that 

\My*)- My^)\< sup My)- inf My)<xi 2 S, (A.i) 


where S = \/2(tj+i — Tj)fM 2 and M 2 is given in Assumption 3.1. If qi,Tj+i —Qi.Tj < <5, then 
IMy*)- My^)\ = \fliy^)iyi - y^)\ < M 2 S, where y^ G Now let us consider 
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the case that qi,Tj+i — qi,Tj > d. We will show that 


hi.y)<dy>Tj+i-T. 




(A.2) 


if 


sup My) 


inf /*(y)>M2(5. 

y6kcx,-.9..xj+i) 


(A.3) 


Letting yjnf = arginf^gj^^ /i(y), j/^up = argsup^^gj,.^^^ ,j. /^(j/), without loss 

of generality, we can assume that y-mi < J/sup- It is obvious that psup — 2/inf > d, because 

if 2/sup - 2/inf < d, then 


sup My) - inf My) = Mvsup) - Mvim) 

y6[gi,T-j :9«,Tj+i) ) 


= l/i(2/''')l(2/sup - 2/inf) < M 2 A 


(A.4) 


We can find a line with slope M 2 that goes through ( 2 /sup, /i (2/sup))- This line would be 

below the curve My) in [ 2 /inf,2/sup), since My) “ My sup) - /K2/^^)(2/-2/sup) >M2{y- 

2/sup) for y < 2/sup, which leads to Mv) > /f (2/sup) + M 2 {y - //sup)- 

Now we can check the area S formed by the line, y = //inf, 2/ = 2/sup, and fi{y) = 0. 
Figure 1 shows two possible cases. The shaded region is S. 

If My sup) - M 2 ( 2 /sup - 2/inf) > 0, the area is equal to 

[2/i(2/sup) M2(2/sup 2/inf)] (//sup 2/inf) ^ /i (//sup) (//sup 2/inf) 

2 - 2 




Figure 1. Illustration of the two possible cases of the area S: trapezoid and triangle. The solid 
curve stands for f{y). The dotted line stands for the line with slope M 2 . The shaded area is S. 
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> 


Mo (52 




(A.5) 


If /j(ysup) - M 2 {ysup - 2/inf) < 0, the area is equal to 


/,(y,,p)2 ^ (M 2 S) 


> 


2 M 2 2 M 2 

Therefore, in both cases, we have 




rUsup 

/ h{y)dy> My)dy>S>Tt+i-Tj, 

— Ti. Hence 


which contradicts with the fact that /j(y) dy = rj_^i — tj 

IMy*) - My^)\ < sup My)- inf My) 

ye [gi.T-j ,gi,T-j+i) ) 

< M 2 S = Y^2M2(Tj+i — Tj) 


(A.6) 


(A.7) 


= 0 


given that tj+i - = O(^). 

Now let us consider the second case. 

Case 2: If yi is a point in the tail, which means yi < qi^n or yi > then we have 
Piyil^fi) — fi{yi) < from Assumption 3.1. For the tail part, we can use a truncated 
normal for the interpolation so that Pm{yi\^m,i) < Therefore, we have |pm(2/i|^m,i) ~ 




' ^/rn' 


Thus for both Cases 1 and 2, we showed \'Pm{yi\(^m,i) —p[yi\()fi)\ = 0(^^). 
(b) Let us first show \pm{yi\(^m,i) - p{yi\xi,BMj\ = 0(:^)- 


\Pm{,yi\^m,i) p{,yi\Xi, BM}\ 




Pm {jji\^m,i) dll^^ 


,(/■)-/ 




P{yi\^h)dBe^^Xfi] 


< 




\Pm{yi\0m,i) - p{yi\0fi ) I dllg^ , (/i) 



Because PmMMMM) = ]Xi=iPm{yi\xi,BMj and p{Y\X,Bm) = Yri=iP{y^MMm), we 
can show \pMY\X,Bm) — p(YMMm)\ = 0('^) simply by induction. We will show 
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the case with n = 2 here. 


\pm{Y\X,B^)-p{Y\X,B^)\ 

— |Pm(yi 1^; 7?m)Pm(y2 1^5 i Bjji^p{lj2\X^ Bjyi^\ 

— IPm (yi 1^7 (y2 1^7 7?m) (yi |^7 (y2 |^7-^m) 

+ Pm(yi|^7 Sr„)7'(y2|-’^7 Bm) “ p{yi\X, Bm)p{y 2 \X, Bm)\ 

< \Pm{yi\X,Bm)\Pm{y2\X,B^) - p{y2\X,Bm)]\ 

+ \[Pm{yi\X,Bm)-p{yi\X,Bm)\p{y2\X,Bm)\ 


= 0 



where Mi is given in Assumption 3.1. The proof can be easily generalized to the case 
with n> 2. □ 


Lemma A.2. 

(a) E^^i\p^iY\X, B^) - p{Y\X, B^)\) = O(^). 

(b) ET^^{\pm{Y\X,Bjn) — Pm-i(L|A,i?m-i)|) = O('y^)- 

Proof. Part (a) of Lemma A.2 follows immediately from Lemma A. 1(b). Part (b) of 
Lemma A.2 can be obtained by applying Lemma A. 2(a) twice. □ 

Lemma A.3. \pmiY\X) - p(Y\X)\ = 0{^). 

Proof. 


|p„(F|A)-p(y|A)| 

= J 7Tm{B^\X)[pmiY\X,Bm)-p{Y\X,Bm)]dBm 
< J 7rmiB^\X)\pm(Y\X,Bm)-p(Y\X,Bm)\dBm 

= E^^i\p^(Y\X, B^) - p(Y\X, Bm)\) 



Now we are ready to prove Theorem 3.2. We have 

\\pm{Bm\X,Y)-p{Brn\X,Y)\\^^ 
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1 

= it sup 

^ | A |<1 

J h{Bm) 

/TTmiBrn\X)pm{Y\X,Brr,) 

'Km{B^\X)p{Y\X,Bm 

V Pm{Y\X) 

p{Y\X) 


j '^m{Bm\X) 

p^(Y\X,Bm) p{Y\X,Bm) 
Pm{Y\X) p{Y\X) 

dBm 


1 /■_ 

^{Bm\X) 

Pm{Y\X, Bm)p{Y\X) - pm{Y\X)p{Y\X, B^) 

dBm 

2 7 

p^{Y\X)plY\X) 

= ^ J TT„i{Bm\X) 





dBr, 


[PmiY\X, Bm) - p{Y\X, Bm)]p{Y\X) + pjYjX, BmMY\X) - Pm{Y\X)] 

p^iY\X)piY\X) 



TT^{Bm\X) 


\pm{Y\X,Bm) - piY\X,Brr^MY\X) +p{Y\X,Bm)\p{Y\X) - Pm{Y\X)\ 

Pm{Y\X)p{Y\X) 


l\E^^{\p^{Y\X,B^)-p{Y\X,Brn)\) 


2 


Pm{Y\X) 


\p^{Y\X)-p{Y\X)\ - 

PmiY\X) 


We already know from Lemma A.3 that Pm{Y\X) p{Y\X) as m —>■ oo, so for any 
e* G (0,p(y|Ar)), there exists an m* such that \pm(Y\X) —p{Y\X)\ < e* for m>m*. We 
can see that 


LB = mm{pmo{Y\X),p^,+^iY\X),... ,p^,_^{Y\X),p{Y\X) - e*) 

is a lower bound for pm{Y\X), where toq is the minimum number of quantiles we use. 
Therefore, \\p^iB,^\X,Y)-p{Bm\X,Y)\\Ty < ^[E^^(\p^(Y\X,Bm)-piY\X,Bm)\) + 
\pm{Y\X) — p(F|X)|] = 0(-^) —>■ 0 as m —>■ oo (because of Lemmas A.2 and A. 3). 
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